ATLAS has a module for automatic categorization of documents, which provides access to several statistical and distance-based algorithms. The categorization module instantiates the configured algorithms with different feature types. It can also run several algorithms simultaneously and combine the results of the individual classifiers.
The module registers one or more algorithms as OSGi services, according to the configuration settings. ATLAS uses these services for categorization tasks that users initiate.
The plugin com.tetracom.atlas.textmining.categorization.algorithms contains implementations of the different categorization algorithms available in ATLAS.
The file com.tetracom.atlas.textmining.categorization.algorithms.properties contains the configuration settings for the automatic categorization module. The file has the following format:
The class CategorizationAlgorithmsProviderService reads the configuration settings, creates instances of the categorization algorithms, and registers them as OSGi services.
Each categorization algorithm implements the ISpecificAutomaticCategorizationService interface. CategorizationAlgorithmFactory uses three parameters to create ISpecificAutomaticCategorizationService instances: name, feature.type and feature.reduction.
Possible options for the name parameter are:
Possible options for the feature.type parameter are:
Possible options for the feature.reduction parameter are:
Based on these parameters, CategorizationAlgorithmFactory returns a new CategorizationAlgorithm object, constructed with the corresponding IFeatureSpaceReducer, ICategoryVectorCreator and IDocsClassifier instances.
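The composition described above can be illustrated with a minimal sketch. It is not the ATLAS implementation: the three interfaces and CategorizationAlgorithm are empty stand-ins that reuse the names from the text, CategorizationAlgorithmFactorySketch and its registries are hypothetical, and the pairing of each parameter with one collaborator (name with the classifier, feature.type with the vector creator, feature.reduction with the reducer) is an assumption made for illustration. The parameter values are placeholders, not real ATLAS options.

```java
import java.util.Map;
import java.util.function.Supplier;

// Empty stand-ins for the ATLAS types named in the text; the real ones
// declare methods that this document does not show.
interface IFeatureSpaceReducer {}
interface ICategoryVectorCreator {}
interface IDocsClassifier {}

/** Stand-in for the ATLAS class: it only holds the three collaborators. */
final class CategorizationAlgorithm {
    final IFeatureSpaceReducer reducer;
    final ICategoryVectorCreator vectorCreator;
    final IDocsClassifier classifier;

    CategorizationAlgorithm(IFeatureSpaceReducer reducer,
                            ICategoryVectorCreator vectorCreator,
                            IDocsClassifier classifier) {
        this.reducer = reducer;
        this.vectorCreator = vectorCreator;
        this.classifier = classifier;
    }
}

/** Hypothetical factory: one registry per configuration parameter (assumed pairing). */
final class CategorizationAlgorithmFactorySketch {
    private final Map<String, Supplier<IDocsClassifier>> classifiersByName;
    private final Map<String, Supplier<ICategoryVectorCreator>> creatorsByFeatureType;
    private final Map<String, Supplier<IFeatureSpaceReducer>> reducersByReduction;

    CategorizationAlgorithmFactorySketch(
            Map<String, Supplier<IDocsClassifier>> classifiersByName,
            Map<String, Supplier<ICategoryVectorCreator>> creatorsByFeatureType,
            Map<String, Supplier<IFeatureSpaceReducer>> reducersByReduction) {
        this.classifiersByName = classifiersByName;
        this.creatorsByFeatureType = creatorsByFeatureType;
        this.reducersByReduction = reducersByReduction;
    }

    /** Resolves each parameter to a collaborator and assembles a new algorithm object. */
    CategorizationAlgorithm create(String name, String featureType, String featureReduction) {
        return new CategorizationAlgorithm(
                lookup(reducersByReduction, "feature.reduction", featureReduction),
                lookup(creatorsByFeatureType, "feature.type", featureType),
                lookup(classifiersByName, "name", name));
    }

    // Fails fast on a value that has no registered implementation.
    private static <T> T lookup(Map<String, Supplier<T>> registry, String parameter, String value) {
        Supplier<T> supplier = registry.get(value);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown " + parameter + ": " + value);
        }
        return supplier.get();
    }

    /** Builds a factory with one placeholder entry per parameter. */
    static CategorizationAlgorithmFactorySketch withPlaceholders() {
        Map<String, Supplier<IDocsClassifier>> classifiers =
                Map.of("placeholderName", () -> new IDocsClassifier() {});
        Map<String, Supplier<ICategoryVectorCreator>> creators =
                Map.of("placeholderFeatureType", () -> new ICategoryVectorCreator() {});
        Map<String, Supplier<IFeatureSpaceReducer>> reducers =
                Map.of("placeholderReduction", () -> new IFeatureSpaceReducer() {});
        return new CategorizationAlgorithmFactorySketch(classifiers, creators, reducers);
    }

    public static void main(String[] args) {
        CategorizationAlgorithm algorithm = withPlaceholders()
                .create("placeholderName", "placeholderFeatureType", "placeholderReduction");
        System.out.println("classifier resolved: " + (algorithm.classifier != null));
    }
}
```

Keeping one registry per parameter means an unsupported configuration value is rejected when the algorithm is created rather than when a categorization task runs. The real factory may select its collaborators differently.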
Each ISpecificAutomaticCategorizationService generates its own algorithm identifier from the three parameters, in the form algorithmIdentifier = name _ feature.type _ feature.reduction.
This identifier is used to distinguish between different instances of the algorithms in the application.
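As a rough illustration of the identifier format, the sketch below joins the three parameter values with an underscore. AlgorithmIdentifierSketch and buildIdentifier are hypothetical names, not ATLAS classes, and the sketch assumes that the separator is a plain underscore with no surrounding spaces. The values passed in main are placeholders rather than real ATLAS options.

```java
import java.util.Objects;

/** Hypothetical helper, not part of ATLAS: builds an identifier from the three parameters. */
final class AlgorithmIdentifierSketch {

    // Assumed separator, read from the "name _ feature.type _ feature.reduction" pattern.
    private static final String SEPARATOR = "_";

    static String buildIdentifier(String name, String featureType, String featureReduction) {
        // Reject missing values so that an identifier never contains the text "null".
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(featureType, "feature.type");
        Objects.requireNonNull(featureReduction, "feature.reduction");
        return String.join(SEPARATOR, name, featureType, featureReduction);
    }

    public static void main(String[] args) {
        // Placeholder values only; the real option names come from the configuration file.
        System.out.println(buildIdentifier("someAlgorithm", "someFeatureType", "someReduction"));
        // prints "someAlgorithm_someFeatureType_someReduction"
    }
}
```

Because the identifier is derived only from the three parameters, two instances configured with the same name, feature.type and feature.reduction would receive the same identifier under this scheme.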
The following sequence of actions is executed in the CategorizationAlgorithm.createModel method:
The following actions are executed in the CategorizationAlgorithm.useModel method:
ATLAS (Applied Technology for Language-Aided CMS) is a project funded by the European Commission under the CIP ICT Policy Support Programme.